Efficient Deep Learning
Scaling Up Quantization-Aware Neural Architecture Search for Efficient Deep Learning on the Edge
Yao Lu, Hiram Rayo Torres Rodriguez, Sebastian Vogel, Nick van de Waterlaat, Pavol Jancura
Neural Architecture Search (NAS) has become the de facto approach for designing accurate and efficient networks for edge devices. Since models are typically quantized for edge deployment, recent work has investigated quantization-aware NAS (QA-NAS) to search for highly accurate and efficient quantized models. However, existing QA-NAS approaches, particularly few-bit mixed-precision (FB-MP) methods, do not scale to larger tasks. Consequently, QA-NAS has mostly been limited to low-scale tasks and tiny networks. In this work, we present an approach to enable QA-NAS (INT8 and FB-MP) on large-scale tasks by leveraging the block-wise formulation introduced by block-wise NAS. We demonstrate strong results for the semantic segmentation task on the Cityscapes dataset, finding FB-MP models 33% smaller and INT8 models 17.6% faster than DeepLabV3 (INT8) without compromising task performance.
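As a rough illustration of the quantization such a search must be aware of (a generic sketch, not the authors' implementation), the following simulates symmetric fake quantization and a per-block bit-width assignment of the kind FB-MP search explores; the bit-widths shown are hypothetical:

```python
# Minimal sketch of symmetric fake quantization, as used during
# quantization-aware search/training; not the paper's implementation.
import torch

def fake_quantize(x: torch.Tensor, bits: int = 8) -> torch.Tensor:
    """Simulate integer quantization in floating point (symmetric, per-tensor)."""
    qmax = 2 ** (bits - 1) - 1                  # e.g. 127 for INT8
    scale = x.abs().max().clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(x / scale), -qmax, qmax)
    return q * scale                            # dequantize back to float

# Block-wise mixed-precision evaluation: each block gets its own candidate
# bit-width (the assignment below is hypothetical).
blocks = [torch.randn(64, 64) for _ in range(3)]
candidate_bitwidths = [8, 4, 6]
quantized = [fake_quantize(w, b) for w, b in zip(blocks, candidate_bitwidths)]
```

Evaluating each block under its candidate bit-width independently is what lets the block-wise formulation sidestep retraining the whole network for every mixed-precision configuration.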
D2Go: a toolkit for efficient deep learning (GitHub: facebookresearch/d2go)
D2Go is a production-ready software system from Facebook Research that supports end-to-end model training and deployment for mobile platforms. Installation requires PyTorch Nightly (the README uses CUDA 10.2 as an example; see the PyTorch website for details). The model zoo provides example configs and pretrained models. D2Go is released under the Apache 2.0 license.
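For a sense of the deployment step such a toolkit automates, here is a minimal sketch using only standard PyTorch APIs (not D2Go's own interface); the model choice and file name are illustrative:

```python
# Sketch of a mobile-export workflow with standard PyTorch APIs;
# D2Go wraps steps like these behind its own training/deployment tooling.
import torch
import torchvision
from torch.utils.mobile_optimizer import optimize_for_mobile

model = torchvision.models.mobilenet_v2(weights=None).eval()
example = torch.randn(1, 3, 224, 224)            # example input for tracing
scripted = torch.jit.trace(model, example)       # convert to TorchScript
mobile = optimize_for_mobile(scripted)           # fuse/fold ops for mobile runtimes
mobile._save_for_lite_interpreter("mobilenet_v2_mobile.ptl")
```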
Efficient Deep Learning Using Non-Volatile Memory Technology
Embedded machine learning (ML) systems have now become the dominant platform for deploying ML serving tasks and are projected to become of equal importance for training ML models. With this comes the challenge of overall efficient deployment, in particular low power and high throughput implementations, under stringent memory constraints. In this context, non-volatile memory (NVM) technologies such as STT-MRAM and SOT-MRAM have significant advantages compared to conventional SRAM due to their non-volatility, higher cell density, and scalability features. While prior work has investigated several architectural implications of NVM for generic applications, in this work we present DeepNVM, a comprehensive framework to characterize, model, and analyze NVM-based caches in GPU architectures for deep learning (DL) applications by combining technology-specific circuit-level models and the actual memory behavior of various DL workloads. DeepNVM relies on iso-capacity and iso-area performance and energy models for last-level caches implemented using conventional SRAM and emerging STT-MRAM and SOT-MRAM technologies.
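To make the iso-area comparison concrete, below is a toy sketch in the spirit of such models; all density and energy numbers are hypothetical placeholders, not values from the paper:

```python
# Toy iso-area cache comparison in the spirit of DeepNVM.
# All figures below are hypothetical placeholders for illustration only.
SRAM     = {"density_mb_per_mm2": 1.0, "read_energy_pj": 1.0, "leakage_mw": 10.0}
STT_MRAM = {"density_mb_per_mm2": 3.0, "read_energy_pj": 1.5, "leakage_mw": 0.5}

def iso_area_capacity(tech: dict, area_mm2: float) -> float:
    """Cache capacity (MB) achievable within a fixed die area."""
    return tech["density_mb_per_mm2"] * area_mm2

area = 4.0  # mm^2 budgeted for the last-level cache
print("SRAM capacity:    ", iso_area_capacity(SRAM, area), "MB")
print("STT-MRAM capacity:", iso_area_capacity(STT_MRAM, area), "MB")
```

Under a fixed area budget, the denser NVM cell buys extra capacity, which such frameworks then trade off against per-access energy and latency for each workload.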
Efficient Deep Learning: From Theory to Practice
Modern machine learning often relies on deep neural networks that are prohibitively expensive in terms of the memory and computational footprint. This in turn significantly inhibits the potential range of applications where we are faced with non-negligible resource constraints, e.g., real-time data processing, embedded devices, and robotics. In this thesis, we develop theoretically-grounded algorithms to reduce the size and inference cost of modern, large-scale neural networks. By taking a theoretical approach from first principles, we intend to understand and analytically describe the performance-size trade-offs of deep networks, i.e., the generalization properties. We then leverage such insights to devise practical algorithms for obtaining more efficient neural networks via pruning or compression. Beyond theoretical aspects and the inference time efficiency of neural networks, we study how compression can yield novel insights into the design and training of neural networks. We investigate the practical aspects of the generalization properties of pruned neural networks beyond simple metrics such as test accuracy. Finally, we show how in certain applications pruning neural networks can improve the training and hence the generalization performance.
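As a baseline for the pruning techniques such work builds on, here is a minimal magnitude-pruning sketch using PyTorch's built-in utilities (a generic illustration, not the thesis' theoretically-grounded algorithms):

```python
# Magnitude pruning with PyTorch's built-in pruning utilities.
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

layer = nn.Linear(512, 512)
prune.l1_unstructured(layer, name="weight", amount=0.9)  # zero the 90% smallest weights
prune.remove(layer, "weight")                            # make the sparsity permanent

sparsity = (layer.weight == 0).float().mean().item()
print(f"weight sparsity: {sparsity:.1%}")
```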
Decoding Efficient Deep Learning: Path to Smaller, Faster, and Better Models
In today's tech-savvy world, it is famously said that predicting the future isn't magic, it's Artificial Intelligence, and data is essentially the new science behind it. The fields of Machine Learning and Artificial Intelligence at large are evolving at a tremendous pace, with every researcher trying to build the best models and continuously beat benchmarks. In the real world, however, deploying a model demands careful analysis of whether deep learning can be scaled efficiently for people who do not have millions of dollars to train their models or gigantic machines to deploy them. Deep learning and Artificial Intelligence have empowered humans to essentially find a needle in a haystack; this piece dives deeper into how the science can be made more accessible by making it more efficient.
Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better
Deep Learning has revolutionized the fields of computer vision, natural language understanding, speech recognition, information retrieval, and more. However, with the progressive improvements in deep learning models, their number of parameters, latency, and resources required to train have all increased significantly. Consequently, it has become important to pay attention to these footprint metrics of a model as well, not just its quality. We present and motivate the problem of efficiency in deep learning, followed by a thorough survey of the five core areas of model efficiency (spanning modeling techniques, infrastructure, and hardware) and the seminal work there. We also present an experiment-based guide along with code for practitioners to optimize their model training and deployment. We believe this is the first comprehensive survey in the efficient deep learning space that covers the landscape of model efficiency from modeling techniques to hardware support. Our hope is that this survey provides the reader with the mental model and the necessary understanding of the field to apply generic efficiency techniques for immediate, significant improvements, and also equips them with ideas for further research and experimentation to achieve additional gains.
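One example of a generic efficiency technique a guide like this typically covers is automatic mixed-precision training; the sketch below uses standard PyTorch APIs and is illustrative rather than taken from the survey's code:

```python
# Automatic mixed-precision (AMP) training sketch in PyTorch.
import torch

model = torch.nn.Linear(1024, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()

for _ in range(10):
    x = torch.randn(32, 1024, device="cuda")
    y = torch.randint(0, 10, (32,), device="cuda")
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():              # run forward pass in FP16 where safe
        loss = torch.nn.functional.cross_entropy(model(x), y)
    scaler.scale(loss).backward()                # scale loss to avoid FP16 underflow
    scaler.step(optimizer)
    scaler.update()
```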
Efficient Deep Learning of GMMs
Shirin Jalali, Carl Nuzman, Iraj Saniee
We show that a collection of Gaussian mixture models (GMMs) in $R^{n}$ can be optimally classified using $O(n)$ neurons in a neural network with two hidden layers (deep neural network), whereas in contrast, a neural network with a single hidden layer (shallow neural network) would require at least $O(\exp(n))$ neurons or possibly exponentially large coefficients. Given the universality of the Gaussian distribution in the feature spaces of data, e.g., in speech, image and text, our result sheds light on the observed efficiency of deep neural networks in practical classification problems.
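A toy empirical companion to this claim (a hedged sketch with illustrative dimensions and data, not the paper's construction) trains a two-hidden-layer network of width on the order of $n$ on samples from two Gaussians:

```python
# Classify samples from two Gaussian components with a two-hidden-layer
# network whose hidden widths are O(n); dimensions here are illustrative.
import torch
import torch.nn as nn

n = 32                                           # feature dimension
X = torch.cat([torch.randn(500, n) - 1.0, torch.randn(500, n) + 1.0])
y = torch.cat([torch.zeros(500, dtype=torch.long), torch.ones(500, dtype=torch.long)])

deep = nn.Sequential(                            # two hidden layers, n neurons each
    nn.Linear(n, n), nn.ReLU(),
    nn.Linear(n, n), nn.ReLU(),
    nn.Linear(n, 2),
)
opt = torch.optim.Adam(deep.parameters(), lr=1e-2)
for _ in range(200):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(deep(X), y)
    loss.backward()
    opt.step()
print("train accuracy:", (deep(X).argmax(1) == y).float().mean().item())
```

The theoretical contrast is that matching this with a single hidden layer would, per the paper, require exponentially many neurons or exponentially large coefficients as $n$ grows.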